Large Language Models Miss the Multi-Agent Mark
Why Current MAS LLMs Research Falls Short of Classical Multi-Agent Systems Principles
Authors: E. La Malfa et al.
Published on Arxiv: 2025-05-27
Link: http://arxiv.org/abs/2505.21298v1
Institutions: Department of Computer Science, University of Oxford • Department of Informatics, King’s College London • Department of Engineering, University of Oxford • University of Sussex
Keywords: Multi-Agent Systems, Large Language Models, Emergent Behavior, Agent Communication Languages, Natural Language Communication, Asynchronous Systems, Theory of Mind, Social Intelligence, Reinforcement Learning, Environment Design, Benchmarks, Communication Protocols
Recent research efforts have focused on Multi-Agent Systems composed of Large Language Models (MAS LLMs), aiming to solve complex problems by coordinating several LLMs. However, the implementations often draw on terminology from classical MAS without faithfully reproducing fundamental MAS principles like social agency, advanced environment design, or rigorous measures of emergent behavior.
To address these observed discrepancies and advance the state of MAS LLMs, this position paper offers several key contributions:
- Provides a systematic critique of recent MAS LLMs literature, revealing mismatches with established MAS theory.
- Analyzes four core aspects: social agency, environment design, coordination/communication protocols, and emergent behavior specific to MAS LLMs.
- Categorizes over 100 papers, examining benchmarks, communication practices, and identifying gaps in current approaches.
- Proposes actionable research directions: integrating MAS concepts, enhancing environment representations, developing multimodal and asynchronous frameworks, and creating precise benchmarks for emergence.
Following this detailed analysis, the authors highlight critical weaknesses in existing MAS LLM frameworks:
- Demonstrates current systems often lack true multi-agent characteristics, instead relying on centralized or ensemble-based LLM architectures instead of authentic agentic interaction.
- Notes experimental weaknesses, such as insufficient social pre-training, weak theory of mind reasoning, poor support for asynchronous interaction, and an overdependence on natural language for communication.
- Reviews benchmarks and datasets, finding a focus on simple, textual, partially observable environments and a lack of standardized metrics to capture emergent behaviors.
In conclusion, the paper synthesizes its critique and recommendations as follows:
- Current MAS LLMs frameworks often miss the autonomy, social interaction, and environment structuring that define multi-agent systems.
- The field should adopt more precise terminology, incorporate MAS principles, build asynchronous-native frameworks, design robust environments, and implement rigorous quantitative measures for emergence.
- Future work should involve LLMs pre-trained for cooperation and competition, environments with multimodality and structure, implementation of standard asynchronous protocols, and the development of clear definitions and metrics for emergent multi-agent behavior.